Patent 

Attorney's Docket No. 030650-073 



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



In re Patent Application of 



Rainer KLISCH et al. 



Group Art Unit: 2641 



Application No.: 09/964,381 



Examiner: Unassigned



Filed: September 28, 2001 



For: 



METHOD AND DEVICE FOR 
ANALYZING A SPOKEN SEQUENCE 
OF NUMBERS 



CLAIM FOR CONVENTION PRIORITY 



Assistant Commissioner for Patents 
Washington, D.C. 20231 

Sir: 

The benefit of the filing date of the following prior foreign application in the following
foreign country is hereby requested, and the right of priority provided in 35 U.S.C. § 119 is hereby
claimed:

European Patent Application No. 00121468.3
Filed: September 29, 2000

In support of this claim, enclosed is a certified copy of said prior foreign application. Said
prior foreign application was referred to in the oath or declaration. Acknowledgment of receipt of
the certified copy is requested.



Respectfully submitted,



Burns, Doane, Swecker & Mathis, L.L.P.



Date: November 2/1-, 2001 




Kenneth B. Leffler 
Registration No. 36,075 



P.O. Box 1404 

Alexandria, Virginia 22313-1404
(703) 836-6620 



(03/01) 







Europäisches
Patentamt 



European 
Patent Office 



Office européen
des brevets 



Bescheinigung Certificate



Attestation 



Die angehefteten Unterlagen stimmen mit der ursprünglich
eingereichten Fassung der auf dem nächsten Blatt bezeichneten
europäischen Patentanmeldung überein.

The attached documents are exact copies of the European patent
application described on the following page, as originally filed.

Les documents fixés à cette attestation sont conformes à la version
initialement déposée de la demande de brevet européen spécifiée à la
page suivante.



Patentanmeldung Nr. / Patent application No. / Demande de brevet n°

00121468.3 



Der Präsident des Europäischen Patentamts;
Im Auftrag

For the President of the European Patent Office

Le Président de l'Office européen des brevets
p.o.




I.L.C. HATTEN-HECKMAN 



DEN HAAG, DEN

THE HAGUE, 04/09/01

LA HAYE, LE



EPA/EPO/OEB Form 1014 -02.91 










Blatt 2 der Bescheinigung 
Sheet 2 of the certificate 
Page 2 de l'attestation



Anmeldung Nr.: 
Application no.: 
Demande n°:



00121468.3 



Anmeldetag: 



Date of filing: 29/09/00
Date de depot: 



Anmelder 
Applicant(s):
Demandeur(s):



TELEFONAKTIEBOLAGET LM ERICSSON (publ)
126 25 Stockholm 
SWEDEN 



Bezeichnung der Erfindung: 
Title of the invention: 
Titre de l'invention:

Method and device for analyzing a spoken sequence of numbers 



In Anspruch genommene Priorität(en) / Priority(ies) claimed / Priorité(s) revendiquée(s)

Staat: Tag: Aktenzeichen:

State: Date: File no.

Pays: Date: Numéro de dépôt:



Internationale Patentklassifikation:
International Patent classification:
Classification internationale des brevets:

G10L15/04, G10L15/26

Am Anmeldetag benannte Vertragsstaaten:
Contracting states designated at date of filing: AT/BE/CH/CY/DE/DK/ES/FI/FR/GB/GR/IE/IT/LI/LU/MC/NL/PT/SE/TR
Etats contractants désignés lors du dépôt:

Bemerkungen:
Remarks:
Remarques:

EPA/EPO/OEB Form 1012 - 11.00









Telefonaktiebolaget LM Ericsson (publ) 
P13726 






EP-85 112 



EPO - Munich
48

29 Sep. 2000

Method and Device for Analyzing a Spoken Sequence of Numbers 



BACKGROUND OF THE INVENTION

Technical Field

The invention relates to a method and a device for analyzing a
spoken sequence of numbers.



Discussion of the Prior Art 



Many technical applications require recognition of a spoken
sequence of numbers. Many mobile telephones, for example, provide
voice dialing, in which a telephone number is uttered. Moreover,
electronic commerce applications require the recognition of
spoken order numbers and spoken credit card numbers.

WO-A-89 04035 discloses a method for recognizing a number, such as
a telephone number, consisting of a plurality of digits. The
digits are uttered singly or in sequences. Two utterances
comprising one or more digits may be separated by the user-defined
placement of pauses. The pause time between two utterances is
monitored and, when an utterance is followed by a predetermined
pause time interval, the recognized digits are repeated back via
a speech synthesizer. A further utterance comprising one or
more digits can then be started, and only that next utterance
is repeated back after a subsequent pause.

While recognition of spoken digits and spoken digit sequences
works reliably even under adverse noise conditions, automatic
recognition of naturally spoken numbers like "twenty two" or
"five hundred thirty" is more difficult. This is because spoken
sequences of numbers like "twenty two" or "five hundred thirty"
can stand for more than one numerical value. The spoken sequence
of numbers "twenty two", for example, can stand either for the
single numerical value "22" or for the two numerical values "20"
and "2". As another example, the sequence "five hundred thirty"
can stand either for the numerical value "530" or for the two
numerical values "500" and "30".

When automatically recognizing a spoken sequence of numbers,
the recognition process becomes increasingly difficult if numbers
with a large numerical value or a long sequence of numbers have
to be analyzed. Thus, the spoken sequence of numbers
"thousand four hundred fifty six" can stand for a single
numerical value or for up to five numerical values. Altogether,
there exist eight possibilities: "1456"; "1000", "4", "100", "50"
and "6"; "1000" and "456"; "1000", "400" and "56"; "1000", "400",
"50" and "6"; "1400" and "56"; "1400", "50" and "6"; and "1450"
and "6".

These ambiguities do not occur only in the English language. In
the German language, for example, the naturally spoken sequence
of numbers "einhundert zehn" can stand both for the single
numerical value "110" and for the two numerical values "100"
and "10". However, the ambiguities relating to the one or more
numerical values of a spoken sequence of numbers may differ
between languages. While, e.g., in the French language
"quarante sept" can stand either for the single numerical value
"47" or for the two numerical values "40" and "7", this ambiguity
does not occur in the German language. In the German language
the numerical value "47" is spoken as "siebenundvierzig" and
the sequence of the two numerical values "40" and "7" is spoken
as "vierzig sieben".

There is, therefore, a need for a method and device for analyzing
a spoken sequence of numbers which allow a robust distinction
between different semantic interpretations with respect to
the one or more numerical values comprised therein.

SUMMARY OF THE INVENTION

The present invention satisfies this need by providing a method
for analyzing a spoken sequence of numbers, wherein the numbers
are recognized by automatic speech recognition and wherein the
method comprises determining a pause length between two
consecutive numbers and deciding whether or not the two
consecutive numbers belong to a single numerical value on the
basis of the determined pause length. A device for analyzing a
spoken sequence of numbers comprises an automatic speech
recognizer, a prosodic unit for determining a pause length
between two consecutive numbers and a processing unit for
deciding whether or not the two consecutive numbers belong to a
single numerical value on the basis of the determined pause
length.

According to the invention, the speaking pause length between
two consecutively spoken numbers is used as the single prosodic
criterion or as one of a plurality of prosodic criteria for
assessing whether the two consecutively spoken numbers belong
to a single numerical value or to two different numerical
values. The speaking pause length is a robust prosodic criterion
for analyzing a spoken sequence of numbers. Further prosodic
parameters, apart from the speaking pause length, on which
the decision whether or not two consecutively spoken numbers
belong to a single numerical value can be based are known from
E. Nöth et al., "Prosodische Information: Begriffsbestimmung und
Nutzen für das Sprachverstehen", in Paulus, Wahl (eds.),
Mustererkennung 1997, Informatik aktuell, Springer-Verlag,
Heidelberg, 1997, pages 37-52, herewith incorporated by reference.

The decision whether or not two consecutively spoken numbers
belong to a single numerical value can be a "hard" decision or
a "soft" decision. The "hard" decision can be based on
determining whether or not certain thresholds of prosodic
parameters have been exceeded. A "soft" decision may be made by
means of a so-called classifier, e.g. a neural network, which
takes into account a plurality of prosodic parameters and which
produces, e.g., a probability decision.
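
The difference between the two kinds of decision can be sketched as follows. This is a minimal illustration only, not the implementation of the invention: the function names, the feature weights and the bias are hypothetical, the 200 ms default is the preferred initial threshold named in the description of the preferred embodiments, and a single logistic unit stands in for whatever classifier is actually used.

```python
import math
from typing import Sequence


def hard_decision(pause_s: float, threshold_s: float = 0.2) -> bool:
    """Hard decision: True if the two numbers form a single numerical value.

    The numbers are split as soon as the pause exceeds the threshold.
    """
    return pause_s <= threshold_s


def soft_decision(features: Sequence[float],
                  weights: Sequence[float],
                  bias: float) -> float:
    """Soft decision: probability that the two numbers form a single value.

    A single logistic unit is used here as a stand-in classifier; the
    weights and bias are assumptions and would have to be trained.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical usage: one feature (pause length in seconds), negative weight,
# so longer pauses lower the probability of a single numerical value.
p_joined = soft_decision([0.05], [-10.0], 2.0)
```

The hard variant yields a yes/no answer per prosodic parameter, whereas the soft variant combines several parameters into one score that a later stage can threshold or weigh against other evidence.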

According to a preferred embodiment of the invention, it is
automatically decided that two consecutive numbers do not belong
to a single numerical value if a certain pause length
threshold is exceeded. Such a mechanism corresponds to the
acoustical perception of a human listener. The two spoken
numbers "20" and "2", e.g., will clearly be perceived by the
human listener as two separate numerical values (i.e. "20" and
"2") if a speaking pause of sufficient duration is made between
speaking the numbers "20" and "2". On the other hand, the spoken
numbers "20" and "2" will be perceived as a single numerical
value (i.e. "22") if no or almost no speaking pause is made.

The speaking pause length threshold which forms the basis for
the decision whether or not two consecutive numbers belong to a
single numerical value can initially be set to a certain value.
This value can be an empirical value estimated on the basis of
a representative speech database. The pause length threshold
can also be adjustable. This allows a user to adapt the speaking
pause length threshold to his own manner of speaking, e.g.
by changing the threshold value in the system settings of the
device.

It has been found that a robust setting of the pause length
threshold is strongly interrelated with speech tempo, which in
turn depends on the individual speaker. In reality, the speech
tempo of different speakers can vary within a wide range.
According to a preferred embodiment of the invention, the pause
length threshold is therefore automatically adapted to the
current user's speaking habit. This can, e.g., be done by
analyzing previously determined speaking pause lengths within
one or more previously uttered numerical values which the user
has already acknowledged to be correct. A new pause length
threshold can then either be set to the mean or the median
computed over these previously determined speaking pause lengths
or be set anywhere between the old threshold and the mean or
median value of the previously determined speaking pause lengths.
In other words: the pause length threshold is shifted.
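
The shifting of the threshold described above can be sketched as a simple interpolation. The function name and the `weight` parameter are assumptions introduced for illustration; a weight of 1.0 sets the new threshold to the mean or median itself, and a weight between 0 and 1 places it between the old threshold and that value.

```python
from statistics import mean, median
from typing import Sequence


def adapt_threshold(old_threshold_s: float,
                    confirmed_pauses_s: Sequence[float],
                    weight: float = 0.5,
                    use_median: bool = False) -> float:
    """Shift the pause length threshold toward the user's speaking habit.

    confirmed_pauses_s holds pause lengths measured within numerical values
    that the user has already acknowledged to be correct.
    """
    if not confirmed_pauses_s:
        # Nothing acknowledged yet: keep the current threshold.
        return old_threshold_s
    target = median(confirmed_pauses_s) if use_median else mean(confirmed_pauses_s)
    # Linear interpolation between the old threshold and the target value.
    return old_threshold_s + weight * (target - old_threshold_s)
```

Using the median instead of the mean makes the update less sensitive to a single unusually long pause among the acknowledged utterances.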

The decision whether or not two consecutively spoken numbers
belong to a single numerical value can be made more robust if
the decision is based not only on the speaking pause length but
also on the previously mentioned further prosodic parameters
apart from the speaking pause length. These further prosodic
parameters can relate to a phoneme duration like phrase-final
lengthening or pre-boundary lengthening, the shape of the energy
contour or specific pitch movements like phrase-final fall.
Preferably, respective thresholds are also provided for
these further prosodic parameters. The decision whether or not
two consecutive numbers belong to a single numerical value can
accordingly also be based on the criterion whether or not a
respective threshold of a further prosodic parameter has been
exceeded.

Like the pause length threshold, the respective thresholds of
further prosodic parameters can be user-adjustable, or be
automatically adjusted dependent on the user's speaking habit, or
be adjusted in accordance with appropriate training data.
Moreover, previously determined further prosodic parameters of
previously uttered numerical values which the user has already
acknowledged to be correct can be used for shifting respective
thresholds of the prosodic parameters.

In many languages, connecting words between two consecutive
numbers of a spoken sequence of numbers indicate that the two
consecutive numbers belong to one numerical value. In the
English language, e.g., such a connecting word is the word "and".
Thus, the spoken sequence of numbers "one hundred and ten"
usually stands for the numerical value "110", even if the total
pause length between "hundred" and "ten", the pause length
between "hundred" and "and" or the pause length between "and" and
"ten" exceeds a previously set pause length threshold.

In order to correctly analyze a spoken sequence of numbers
comprising one or more connecting words between two consecutive
numbers, a preferred embodiment of the invention comprises the
feature of recognizing such a connecting word. According to a
first variant of the invention, it is determined that two
consecutive numbers belong to a single numerical value every time
a connecting word is arranged between the two numbers.

According to a second variant, upon recognition of a connecting
word between two consecutive numbers, the pause length
threshold for determining whether or not the two consecutive
numbers belong to a single numerical value is changed. In other
words: upon recognition of a connecting word, the decision
whether or not two consecutive numbers belong to a single
numerical value is based on a different pause length threshold
than in the case where no such connecting word is recognized.
Consequently, two different pause length thresholds are utilized.
Analyzing a spoken sequence of numbers thus becomes more robust
because in certain cases the consecutive numbers belong to
different numerical values although a connecting word is arranged
therebetween, especially in cases where the pause length between
the two consecutive numbers is extremely long (e.g. when a user
places long pauses between the connecting word and the number
preceding or following the connecting word).
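
The two variants for handling a connecting word can be contrasted in a short sketch. The function name, the `variant` switch and the 1.0 s threshold applied when a connecting word is present are assumptions chosen for illustration; the text only states that a different threshold is used in that case.

```python
def belongs_to_single_value(pause_s: float,
                            has_connecting_word: bool,
                            threshold_s: float = 0.2,
                            connected_threshold_s: float = 1.0,
                            variant: int = 2) -> bool:
    """Decide whether two consecutive numbers form a single numerical value.

    variant 1: a connecting word always joins the two numbers.
    variant 2: a connecting word selects a second (here: larger, assumed)
               pause length threshold instead of forcing the join.
    """
    if has_connecting_word:
        if variant == 1:
            return True
        return pause_s <= connected_threshold_s
    return pause_s <= threshold_s


# "one hundred and ten" with a 0.5 s total pause: joined in both variants,
# although 0.5 s exceeds the ordinary 0.2 s threshold.
joined = belongs_to_single_value(0.5, has_connecting_word=True)
```

With the second variant, an extremely long pause around the connecting word can still split the sequence, which the first variant cannot do.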


There exist several possibilities for determining a speaking
pause length between two consecutive numbers of a spoken
sequence of numbers. The pause length can, e.g., be directly
determined by measuring a silence interval between two
consecutively spoken numbers. This can be done with a so-called
voice activity detector. A speaking pause length can also be
determined indirectly using the information obtained as a
by-product from the process of automatic speech recognition.
During automatic speech recognition not only the words themselves
but also their respective start and end points on a time axis
are computed. The pause length can thus be determined based on
an end point of the first of two consecutive numbers and a
starting point of the second of the two consecutive numbers.
Especially in noisy environments, this technique usually leads to
more robust results than measuring a silence interval between
two consecutive numbers.
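
The indirect determination from recognizer time stamps can be sketched as follows. The `Word` record and its field names are hypothetical; the sketch only assumes that the recognizer reports a start and an end time for each recognized word.

```python
from typing import List, NamedTuple, Sequence


class Word(NamedTuple):
    """A recognized word with its start and end points on the time axis."""
    text: str
    start_s: float
    end_s: float


def pause_lengths(words: Sequence[Word]) -> List[float]:
    """Pause between each pair of consecutive words, in seconds.

    The pause is the start point of the following word minus the end point
    of the preceding word; overlapping segments are clamped to zero.
    """
    return [max(0.0, nxt.start_s - cur.end_s)
            for cur, nxt in zip(words, words[1:])]


# Hypothetical recognizer output for "five hundred thirty".
recognized = [
    Word("five", 0.0, 0.4),
    Word("hundred", 0.5, 1.0),
    Word("thirty", 1.5, 1.9),
]
pauses = pause_lengths(recognized)  # roughly 0.1 s and 0.5 s
```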

BRIEF DESCRIPTION OF THE DRAWINGS 

Further aspects and advantages of the invention will become ap- 
parent upon reading the following detailed description of pre- 
ferred embodiments of the invention and upon reference to the 
drawings in which: 

Fig. 1 is a schematic diagram of a device for analyzing a 
spoken sequence of numbers according to the inven- 
tion; and 

Fig. 2 is a schematic diagram of a method for analyzing a 
spoken sequence of numbers according to the inven- 
tion. 

DESCRIPTION OF THE PREFERRED EMBODIMENTS

In Fig. 1, a schematic diagram of a device 100 for analyzing a 
spoken sequence of numbers according to the invention is illus- 
trated. The analyzing device 100 depicted in Fig. 1 comprises 
an automatic speech recognizer 120, a prosodic unit 140 for de- 
termining a pause length between two consecutive numbers, a 
processing unit 160 for deciding if the two consecutive numbers 
belong to a single numerical value and an input unit 180. 

Upon speaking a sequence of numbers like "five hundred thirty",
the automatic speech recognizer 120 recognizes each of the
spoken numbers as well as connecting words comprised within the
spoken sequence of numbers. During the recognition process, the
starting and end points in time of the recognized numbers and
connecting words are computed. These starting and end points
are output to the prosodic unit 140 for determining the pause
length between two consecutive numbers or between a connecting
word and a preceding or subsequent number.

The processing unit 160 receives input from both the automatic
speech recognizer 120 and the prosodic unit 140. Based on the
numbers recognized by the automatic speech recognizer 120, the
presence of connecting words between two consecutive numbers
and the pause length between two consecutive numbers or between
a connecting word and a number preceding or following the
connecting word, the processing unit 160 analyzes the spoken
sequence of numbers with respect to the one or more numerical
values contained therein.

The processing unit 160 decides whether or not two consecutive
numbers belong to a single numerical value on the basis of a
pause length threshold. This pause length threshold is
initially set to a value between 100 ms and 1 s, preferably to a
value of 200 ms.

By means of the input unit 180, a user can adapt this initial
threshold to his own manner of speaking. The input unit 180
comprises a graphical or physical slide bar which allows the
threshold to be adjusted within a predetermined range.
The input unit 180 also allows selection of an automatic
adaptation of the threshold to the speaking habit of one or more
users of the device 100.

The function of the device 100 is hereinafter described in more 
detail with reference to Fig. 2. 

First of all, a pause length threshold Θ is set to a certain
value, either automatically, by the user or according to
appropriate training data. Then, the user speaks the sequence
"five hundred thirty" consisting of the three numbers "five",
"hundred" and "thirty". These spoken numbers are subjected to
automatic speech recognition in the automatic speech recognizer
120. The automatic speech recognizer 120 recognizes the three
numbers "five", "hundred" and "thirty" with their respective
starting and end points. The detection of the respective starting
and end points indicates that there is a first pause between the
first number "five" and the second number "hundred" and a second
pause between the second number "hundred" and the third
number "thirty".

The starting and end points of the three numbers are input to
the prosodic unit 140, which determines a pause length P1 of the
first pause as well as a pause length P2 of the second pause.
The three numbers recognized by the automatic speech recognizer
120 and the two pause lengths P1 and P2 determined by the
prosodic unit 140 are input to the processing unit 160, which
decides if two consecutive numbers belong to a single numerical
value on the basis of the measured pause lengths P1 and P2.

If both the pause length P1 and the pause length P2 exceed the
pause length threshold Θ, the processing unit 160 decides that
the spoken sequence of numbers contains three numerical values,
i.e. "5", "100" and "30". If neither of the two pause lengths
P1 and P2 exceeds the pause length threshold Θ, the processing
unit 160 decides that the spoken sequence of numbers contains a
single numerical value, i.e. "530".

If the processing unit 160 determines that only the first pause
length P1 exceeds the pause length threshold Θ, it decides
that the spoken sequence of numbers contains the two numerical
values "5" and "130". On the other hand, if only the second
pause length P2 exceeds the pause length threshold Θ, the
processing unit 160 decides that the spoken sequence of numbers
contains the two numerical values "500" and "30".
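
The four cases described above for "five hundred thirty" can be reproduced with the following sketch. It is an illustration under stated assumptions only: the small English vocabulary, the helper `compose` and the function `segment` are hypothetical and are not taken from the application, and the threshold of 0.2 s is the preferred initial value mentioned earlier.

```python
from typing import Dict, List, Sequence

# Assumed minimal vocabulary; a real system would cover all number words.
SMALL: Dict[str, int] = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "twenty": 20,
    "thirty": 30, "forty": 40, "fifty": 50,
}


def compose(words: Sequence[str]) -> int:
    """Combine the number words of one group into one numerical value."""
    total, current = 0, 0
    for word in words:
        if word == "hundred":
            current = max(current, 1) * 100   # bare "hundred" counts as 100
        elif word == "thousand":
            total += max(current, 1) * 1000   # bare "thousand" counts as 1000
            current = 0
        else:
            current += SMALL[word]
    return total + current


def segment(words: Sequence[str], pauses_s: Sequence[float],
            threshold_s: float = 0.2) -> List[int]:
    """Split the sequence wherever a pause exceeds the threshold.

    pauses_s[i] is the pause between words[i] and words[i + 1].
    """
    groups: List[List[str]] = [[words[0]]]
    for word, pause in zip(words[1:], pauses_s):
        if pause > threshold_s:
            groups.append([word])      # pause too long: new numerical value
        else:
            groups[-1].append(word)    # same numerical value continues
    return [compose(group) for group in groups]


spoken = ["five", "hundred", "thirty"]
both_long = segment(spoken, [0.3, 0.3])      # three values
neither_long = segment(spoken, [0.05, 0.05])  # one value
first_long = segment(spoken, [0.3, 0.05])     # split after "five"
second_long = segment(spoken, [0.05, 0.3])    # split after "hundred"
```

Because the groups are built from left to right, the sequence can be analyzed in the order in which the numbers are spoken.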

According to the method depicted in Fig. 2, the pause length P1
is determined prior to the pause length P2. This allows the
spoken sequence of numbers to be analyzed in the order in which
the numbers are spoken. Of course, the pause lengths P1 and P2
may also be determined and analyzed in a different order. This
may require all numbers of the sequence of numbers to be
spoken prior to the analyzing step.

Although the method depicted in Fig. 2 relates to a decision
which is solely based on the determined pause length, the
prosodic unit 140 depicted in Fig. 1 may also determine further
prosodic parameters apart from the pause length, and the
decision may also be based on these further prosodic parameters.
Besides, the automatic speech recognizer 120 may also recognize
connecting words within a spoken sequence of numbers and the
processing unit 160 may, upon recognition of a connecting word,
apply a different threshold regarding the one or more prosodic
parameters on which the decision is based. Also, the decision
can be based solely on one or more prosodic parameters apart
from the pause length.

The device 100 and the method according to the invention may be 
used for many applications, e. g. stationary electronic com- 
merce systems or mobile applications like mobile telephones. 



CLAIMS

1. A method for analyzing a spoken sequence of numbers
recognized by automatic speech recognition, comprising:

- determining a speaking pause length between two consecutive
numbers; and

- deciding whether or not the two consecutive numbers belong
to a single numerical value on the basis of the determined
pause length.

2. The method according to claim 1, further comprising deter- 
mining one or more further prosodic parameters apart from the 
pause length and deciding whether or not the two consecutive 
numbers belong to a single numerical value based also on the 
one or more further prosodic parameters. 

3. The method according to claim 1 or 2, wherein the decision
is based on a threshold of the pause length and/or of the one
or more further prosodic parameters.

4. The method according to claim 3, wherein the threshold is 
initially set to an empirical value. 

5. The method according to claim 3 or 4, wherein the 
threshold is user-adjustable. 

6. The method according to claim 3 or 4, wherein the 
threshold is automatically adjusted dependent on a user's 
speaking habit or dependent on appropriate training data. 

7. The method according to one of claims 2 to 6, wherein the
threshold of the pause length and/or of the further prosodic
parameters is shifted on the basis of one or more previously
determined pause lengths and/or previously determined further
prosodic parameters relating to one or more correctly
determined numerical values.

8. The method according to one of claims 1 to 7, wherein the
pause length is determined by measuring a silence interval
between two consecutive numbers.

9. The method according to one of claims 1 to 7, further
comprising obtaining an end point of a first of the two
consecutive numbers and a starting point of a second of the two
consecutive numbers during automatic speech recognition and
determining the pause length based on the end point and the
starting point.

10. The method according to one of claims 1 to 9, further
comprising recognizing a connecting word within the spoken
sequence of numbers.

11. The method according to claim 10, wherein, upon recognition
of a connecting word, the decision whether or not two
consecutive numbers belong to a single numerical value is based
on a different pause length threshold.

12. A device (100) for analyzing a spoken sequence of numbers
comprising:

- an automatic speech recognizer (120);
- a prosodic unit (140) for determining a speaking pause
length between two consecutive numbers; and
- a processing unit (160) for deciding whether or not the
two consecutive numbers belong to a single numerical value
on the basis of the determined pause length.

13. The device according to claim 12, wherein the prosodic
unit (140) determines one or more further prosodic parameters
apart from the speaking pause length and wherein the processing
unit (160) decides whether or not the two consecutive numbers
belong to a single numerical value based also on the one or
more further prosodic parameters.

14. The device according to claim 12 or 13, wherein the
automatic speech recognizer (120) recognizes a connecting word
between the spoken sequence of numbers.

Method and Device for Analyzing a Spoken Sequence of Numbers

A method for analyzing a spoken sequence of numbers recognized
by automatic speech recognition comprises determining the
speaking pause length between two consecutive numbers and
deciding if the two consecutive numbers belong to a single
numerical value on the basis of the determined pause length. A
device for analyzing a spoken sequence of numbers comprises an
automatic speech recognizer, a unit for determining the pause
length between two consecutive numbers and a processing unit
for deciding if the two consecutive numbers belong to a single
numerical value on the basis of the determined pause length.

(Fig. 2)






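Figure 2: flow diagram of the method. The spoken input
"five hundred thirty" is subjected to automatic speech
recognition, which yields the numbers "five", "hundred" and
"thirty"; the pause lengths P1 and P2 are then determined. The
legible results are "500-30" and "530"; the remaining branches of
the diagram are not recoverable from the scan.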



